Introduction to data science in R
Lesson 4: For loops


Brian S. Evans, Ph.D.
Migratory Bird Center
Smithsonian Conservation Biology Institute


Setup for the lesson


# Load RCurl library:

library(RCurl)

# Load a source script:

script <-
  getURL(
    "https://raw.githubusercontent.com/bsevansunc/workshop_languageOfR/master/sourceCode.R"
  )

# Evaluate then remove the source script:

eval(parse(text = script))

rm(script)

For loops


Why would you use for loops?

# Filter irisTbl to setosa:

irisTbl[irisTbl$species == 'setosa', ]

# Extract the petalLength field (column):

irisTbl[irisTbl$species == 'setosa', ]$petalLength

# Calculate the mean of petal lengths:

mean(irisTbl[irisTbl$species == 'setosa', ]$petalLength)

Exercise One:


Calculate the mean petal length of each of the Iris species using matrix notation (as above) and a custom function.


# Mean petal lengths, matrix notation:

mean(irisTbl[irisTbl$species == 'setosa', ]$petalLength)
mean(irisTbl[irisTbl$species == 'versicolor', ]$petalLength)
mean(irisTbl[irisTbl$species == 'virginica', ]$petalLength)

# Mean petal lengths, function method:

meanPetalFun <- function(spp){
  mean(irisTbl[irisTbl$species == spp, ]$petalLength)
}

meanPetalFun('setosa')
meanPetalFun('versicolor')
meanPetalFun('virginica')

Indexing review, vectors


Consider the following numeric vector, v:


[1] [2] [3] [4] [5]
1 1 2 3 5

Vector v is an R object composed of five numbers.

# Explore vector v:

v

class(v)

str(v)

length(v)

Indexing review, vectors


[1] [2] [3] [4] [5]
1 1 2 3 5

Each value in a vector has a position, denoted by “[i]”.

Recall: v[i] is the value of v at position i.

# Explore vector v using indexing:

i <- 3

v[i]

v[3]

v[3] == v[i]

Indexing review, vectors



\[V_{new, i} = V_{i} + 1\]

Each value in a vector has a position, denoted by “[i]”.

Recall: v[i] is the value of v at position i.

# Add 1 to the value of v at position three:

i <- 3

v[3] + 1

v[i] + 1

For loops, simple example



\[V_{new, i} = V_{i} + 1\]

Writing proper for loops requires following these three steps:

  1. Output: Always define an object for storing output (e.g., an empty vector, matrix, or list).
  2. Sequence: The positions over which the loop will run.
  3. Body: The instructions for what will occur during each iteration of the loop.

For loop, output:



\[V_{new, i} = V_{i} + 1\]

ALWAYS specify an object to store your output!

Vector objects are defined as:

# Define a vector for output:

vNew <- vector('numeric', length = length(v))

str(vNew)
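
The same vector() call can preallocate other kinds of output objects. The sketch below is illustrative only: the object names (numOut, chrOut, lstOut) are hypothetical and are not part of the lesson's source script.

```r
# Preallocate empty output objects of different modes (names are hypothetical):
numOut <- vector('numeric', length = 3)    # filled with 0
chrOut <- vector('character', length = 3)  # filled with ""
lstOut <- vector('list', length = 3)       # filled with NULL

# Inspect the structure of each empty object:
str(numOut)

str(chrOut)

str(lstOut)
```

A list is the usual choice when each iteration produces something more complex than a single value, such as a data frame.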

For loop, output


ALWAYS specify an object to store your output!

# Explore filling values of vNew by index:

i <- 3

v[i]

vNew[i] <- v[i] + 1

vNew[i]

v[i] + 1 == vNew[i]

For loop, sequence


The sequence can be defined using:

v

1:5

1:length(v)

seq_along(v)

# Example for loop sequence statements:

# for(i in 1:length(v))
  
# for(i in seq_along(v))
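
The two sequence statements behave the same for a vector like v, but they differ when the vector is empty. A minimal sketch of that difference, assuming a hypothetical empty vector named emptyV:

```r
# An empty vector (hypothetical example object):
emptyV <- numeric(0)

# 1:length(emptyV) counts backwards from 1 to 0, so a loop would run twice:
1:length(emptyV)

# seq_along(emptyV) returns a zero-length sequence, so a loop would not run:
seq_along(emptyV)

# Count the iterations for each sequence:
nColon <- 0
for(i in 1:length(emptyV)) nColon <- nColon + 1

nSeqAlong <- 0
for(i in seq_along(emptyV)) nSeqAlong <- nSeqAlong + 1

c(colon = nColon, seqAlong = nSeqAlong)
```

For this reason seq_along() is generally the safer habit when the length of the input is not known in advance.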

For loop, body


The for loop body describes what will happen at each iteration of the loop. For example:

i <- 3

vNew[i] <- v[i] + 1

For loop, putting it together


  1. Output
  2. Sequence
  3. Body

# For loop output:

vNew <- vector('numeric',length = length(v))

# For loop sequence:

for(i in seq_along(v)){
  # For loop body:
  vNew[i] <- v[i] + 1
}

# Explore first for loop output:

vNew

vNew == v + 1

Exercise Two:



\[y = mx + b\]
  1. Convert the above mathematical formula to a function with arguments m, b, and x.

  2. Generate a sequential vector of values containing all integers from 1 to 10. Assign the name x to the vector object.

  3. Use a for loop and the function above to calculate values of y where: m = 0.5, b = 1.0, and x refers to the vector x above (Note: A for loop is not really required here).

Exercise Two:



\[y = mx + b\]
  1. Convert the above mathematical formula to a function with arguments m, b, and x.

linearModel <- function(m, x, b){
  m*x+b
}

Exercise Two:



\[y = mx + b\]
  2. Generate a sequential vector of values containing all integers from 1 to 10. Assign the name x to the vector object.

x <- 1:10

Exercise Two:



\[y = mx + b\]
  3. Use a for loop and the function above to calculate values of y where: m = 0.5, b = 1.0, and x refers to the vector x above (Note: A for loop is not really required here).

x <- 1:10

y <- vector('numeric',length = length(x))

for(i in seq_along(x)){
  y[i] <- linearModel(m = 0.5, b = 1.0, x = x[i])
}
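
As the note in the exercise says, a for loop is not really required here, because arithmetic in R is vectorized. The sketch below repeats the loop and compares it with a single vectorized call; yVectorized is a hypothetical name introduced only for the comparison.

```r
# Function from step 1:
linearModel <- function(m, x, b){
  m*x+b
}

x <- 1:10

# For loop version:
y <- vector('numeric', length = length(x))

for(i in seq_along(x)){
  y[i] <- linearModel(m = 0.5, b = 1.0, x = x[i])
}

# Vectorized version (yVectorized is a hypothetical name):
yVectorized <- linearModel(m = 0.5, b = 1.0, x = x)

# Both approaches give the same values:
all.equal(y, yVectorized)
```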

Subsetting with for loops


Split-Apply-Combine

# Mean petal lengths of Iris species without a for loop:

mean(irisTbl[irisTbl$species == 'setosa', ]$petalLength)

mean(irisTbl[irisTbl$species == 'versicolor', ]$petalLength)

mean(irisTbl[irisTbl$species == 'virginica', ]$petalLength)

Subsetting with for loops


Split-Apply-Combine


Start by creating a vector of species:

# Make a vector of species to loop across:

irisSpecies <- levels(irisTbl$species)

irisSpecies

Subsetting with for loops


Split-Apply-Combine


Create an empty vector to store our output:

# For loop output statement:

petalLengths <- vector('numeric',length = length(irisSpecies))

petalLengths

Subsetting with for loops


Split-Apply-Combine


Split: The for loop body starts by splitting the data.

# Exploring the iris data, subsetting by species:

i <- 3

irisSpecies[i]

irisTbl[irisTbl$species == irisSpecies[i], ]

# Split:

iris_sppSubset <- irisTbl[irisTbl$species == irisSpecies[i], ]

Subsetting with for loops


Split-Apply-Combine


Apply: Modification of the data:

# Calculate mean petal length of each subset (apply):

mean(iris_sppSubset$petalLength)

Subsetting with for loops


Split-Apply-Combine

# Make a vector of species to loop across:

irisSpecies <- levels(irisTbl$species)

# For loop output statement:

petalLengths <- vector('numeric',length = length(irisSpecies))

# For loop:

for(i in seq_along(irisSpecies)){
  # Split:
  iris_sppSubset <- irisTbl[irisTbl$species == irisSpecies[i], ]
  # Apply:
  petalLengths[i] <- mean(iris_sppSubset$petalLength)
}

Subsetting with for loops


Split-Apply-Combine


Combine: Combining the for loop output

# Make a tibble data frame of the for loop output (combine):

petalLengthFrame <-
  data_frame(species = irisSpecies, meanPetalLength = petalLengths)
  
petalLengthFrame
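
The full split-apply-combine sequence can be tried without the lesson's source script. The sketch below makes two assumptions: it uses R's built-in iris data frame (columns Species and Petal.Length) as a stand-in for irisTbl, and it uses base data.frame() in place of a tibble.

```r
# Stand-in for irisTbl: R's built-in iris data frame (an assumption):
irisSpecies <- levels(iris$Species)

# For loop output statement:
petalLengths <- vector('numeric', length = length(irisSpecies))

for(i in seq_along(irisSpecies)){
  # Split:
  iris_sppSubset <- iris[iris$Species == irisSpecies[i], ]
  # Apply:
  petalLengths[i] <- mean(iris_sppSubset$Petal.Length)
}

# Combine (base R data frame):
data.frame(species = irisSpecies, meanPetalLength = petalLengths)

# Check against tapply, which splits and applies in one call:
tapply(iris$Petal.Length, iris$Species, mean)
```

The tapply() check is only a way to confirm the loop's result; the loop makes each of the three steps explicit.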

Exercise Three:


Use a for loop and the birdHabits data frame to calculate the number of species in each diet guild.


birdHabits

diets <- unique(birdHabits$diet)

outVector <- vector('numeric', length = length(diets))

for(i in seq_along(diets)){
  # Split:
  dietSubset <- birdHabits[birdHabits$diet == diets[i],]
  # Apply:
  outVector[i] <- nrow(dietSubset)
}

# Combine: 
data_frame(diet = diets, nSpecies = outVector)
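
The same pattern can be checked on a small data set whose answer is easy to verify by eye. In the sketch below, birdHabitsToy is a hypothetical stand-in for birdHabits; its species codes and diets are made up for illustration.

```r
# Toy stand-in for birdHabits (species and diets are made up):
birdHabitsToy <- data.frame(
  species = c('sppA', 'sppB', 'sppC', 'sppD', 'sppE'),
  diet = c('omnivore', 'insectivore', 'omnivore', 'granivore', 'insectivore'),
  stringsAsFactors = FALSE
)

diets <- unique(birdHabitsToy$diet)

outVector <- vector('numeric', length = length(diets))

for(i in seq_along(diets)){
  # Split:
  dietSubset <- birdHabitsToy[birdHabitsToy$diet == diets[i], ]
  # Apply:
  outVector[i] <- nrow(dietSubset)
}

# Combine:
data.frame(diet = diets, nSpecies = outVector)

# Check against table(), which counts rows per diet directly:
table(birdHabitsToy$diet)
```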

For loops across data objects


For loops can be used to explore data objects with common features.

How many omnivorous birds were observed at each site?

# Explore the bird count data:

head(birdCounts)

str(birdCounts)

# Explore the bird trait data:

head(birdHabits)

str(birdHabits)

For loops across data objects


Example, site == 'apple'


# Extract vector of omnivorous species:

omnivores <- birdHabits[birdHabits$diet == 'omnivore',]$species

# Subset the counts to omnivores:

birdCounts[birdCounts$species %in% omnivores, ]$count

# Calculate the sum of counts:

sum(birdCounts[birdCounts$species %in% omnivores, ]$count)

For loops across data objects


Example, site == 'apple'


# Subset the omnivore counts to site apple:

birdCounts[birdCounts$species %in% omnivores &
             birdCounts$site == 'apple', ]

# Extract the count column:

birdCounts[birdCounts$species %in% omnivores &
             birdCounts$site == 'apple', ]$count

# Calculate the sum:

sum(birdCounts[birdCounts$species %in% omnivores &
             birdCounts$site == 'apple', ]$count)
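
The subsets above rely on %in%, which tests each element of one vector for membership in another and returns a logical vector. A minimal sketch, using hypothetical species codes and sites rather than the lesson's data:

```r
# Hypothetical species codes and sites, for illustration only:
observed <- c('sppA', 'sppB', 'sppC', 'sppA')
siteToy <- c('apple', 'apple', 'bear', 'bear')
omnivoresToy <- c('sppA', 'sppC')

# %in% tests each element of the left vector for membership in the right:
isOmnivore <- observed %in% omnivoresToy

isOmnivore

# Combine with a second condition using &, as in the site filter above:
isOmnivoreAtApple <- isOmnivore & siteToy == 'apple'

isOmnivoreAtApple
```

Using == in place of %in% would compare the two vectors element by element with recycling, which is rarely what is wanted when matching against a set of values.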

Exercise Four:


Using the birdHabits and birdCounts data frames, modify the function below such that it will calculate the number of species of a given guild at a selected site.


richnessSiteGuild <- function(site, guild){
  guildSpp <- birdHabits[birdHabits$foraging # COMPLETE
  countSppSubset <- birdCounts[birdCounts$ # COMPLETE
  countSppSiteSubset <- countSppSubset[# COMPLETE
  nSpp <- # COMPLETE
  return(nSpp)
}

richnessSiteGuild('apple', 'ground')

Exercise Four:


Using the birdHabits and birdCounts data frames, modify the function below such that it will calculate the number of species of a given guild at a selected site.


richnessSiteGuild <- function(site, guild){
  guildSpp <- birdHabits[birdHabits$foraging == guild,]$species
  countSppSubset <- birdCounts[birdCounts$species %in% guildSpp,]
  countSppSiteSubset <- countSppSubset[countSppSubset$site == site,]
  nSpp <- length(unique(countSppSiteSubset$species))
  return(nSpp)
}

richnessSiteGuild('apple', 'ground')

For loops across data objects


How many omnivorous birds were observed at each site?

Get a vector of omnivorous species from the birdHabits data frame:

# Extract vector of omnivorous species:

omnivores <- birdHabits[birdHabits$diet == 'omnivore',]$species

For loops across data objects


How many omnivorous birds were observed at each site?

Split the data into individual sites.

# Generate a vector of unique sites:

sites <- unique(birdCounts$site)

# Site at position i:

i <- 3

sites[i]

# Subset data:

birdCounts_siteSubset <- birdCounts[birdCounts$site == sites[i],]

birdCounts_siteSubset

For loops across data objects


How many omnivorous birds were observed at each site?

Split: Use %in% to extract only the records associated with omnivores, then pull out the count field.


# Just a vector of omnivore counts:

countVector <-
  birdCounts_siteSubset[birdCounts_siteSubset$species %in%
  omnivores,]$count

For loops across data objects


How many omnivorous birds were observed at each site?

Apply: Sum the count vector.


# Get total number of omnivores at the site:

nOmnivores <- sum(countVector)

For loops across data objects


How many omnivorous birds were observed at each site?

Combine: Values combined using the vector method

sites <- unique(birdCounts$site)

outVector <- vector('numeric', length = length(sites))

for(i in seq_along(sites)){
  birdCounts_siteSubset <- birdCounts[birdCounts$site == sites[i],]
  countVector <-
    birdCounts_siteSubset[birdCounts_siteSubset$species %in%
    omnivores, ]$count
  outVector[i] <- sum(countVector)
}

# Combine:

data_frame(site = sites, nOmnivores = outVector)

For loops across data objects


How many omnivorous birds were observed at each site?

Combine: Values combined using the list method

sites <- unique(birdCounts$site)

outList <- vector('list', length = length(sites))

for(i in seq_along(sites)){
  birdCounts_siteSubset <- birdCounts[birdCounts$site == sites[i],]
  countVector <-
    birdCounts_siteSubset[birdCounts_siteSubset$species %in%
    omnivores,]$count
  outList[[i]] <- data_frame(
    site = sites[i],
    nOmnivores = sum(countVector))
}

# Combine:

bind_rows(outList)
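
The list method can also be sketched in base R alone. The example below is an assumption-laden stand-in: birdCountsToy and omnivoresToy are hypothetical objects with made-up values, and do.call(rbind, ...) takes the place of bind_rows.

```r
# Toy stand-ins for birdCounts and omnivores (all values are made up):
birdCountsToy <- data.frame(
  site = c('apple', 'apple', 'bear', 'bear', 'bear'),
  species = c('sppA', 'sppB', 'sppA', 'sppC', 'sppB'),
  count = c(3, 1, 2, 4, 5),
  stringsAsFactors = FALSE
)

omnivoresToy <- c('sppA', 'sppC')

sites <- unique(birdCountsToy$site)

outList <- vector('list', length = length(sites))

for(i in seq_along(sites)){
  # Split:
  siteSubset <- birdCountsToy[birdCountsToy$site == sites[i], ]
  countVector <- siteSubset[siteSubset$species %in% omnivoresToy, ]$count
  # Apply, storing a one-row data frame per site:
  outList[[i]] <- data.frame(
    site = sites[i],
    nOmnivores = sum(countVector),
    stringsAsFactors = FALSE)
}

# Combine with base R (stands in for bind_rows):
omnivoreFrame <- do.call(rbind, outList)

omnivoreFrame
```

Storing one data frame per iteration is useful when each iteration returns several values; for a single number per site, the vector method is simpler.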

Exercise Five:


Using the richnessSiteGuild function you created in Exercise Four and the birdHabits and birdCounts data frames, modify the for loop code below to count the number of species that are ground foragers at each site.


sites <- unique(# COMPLETE 

outList <- vector('list', length = # COMPLETE 

for(i in # COMPLETE 
  outList[[i]] <- data_frame(site = sites[i],
  # COMPLETE 
}

bind_rows(# COMPLETE 

Exercise Five:


Using the richnessSiteGuild function you created in Exercise Four and the birdHabits and birdCounts data frames, write a for loop that will count the number of observed species that are ground foragers at each site.


sites <- unique(birdCounts$site)

outList <- vector('list', length = length(sites))

for(i in seq_along(sites)) {
  outList[[i]] <- data_frame(site = sites[i],
  nSpecies = richnessSiteGuild(sites[i], 'ground'))
}

bind_rows(outList)

Simulation with for loops


A for loop can generate a vector of numbers based on a mathematical function. For example:


\[n_t = 2(n_{t-1})\]

# For loop output:


n <- vector('numeric', length = 5)

n

# Set the seed value:

n[1] <- 10

n

Simulation with for loops


A for loop can generate a vector of numbers based on a mathematical function. For example:


\[n_t = 2(n_{t-1})\]

# For loop sequence:

# for(i in 2:length(n))

Simulation with for loops


A for loop can generate a vector of numbers based on a mathematical function. For example:


\[n_t = 2(n_{t-1})\]

Body: For each iteration (example, position 2):

# Exploring the construction of the for loop body:

i <- 2

n[i]

n[i-1]

n[i] <- 2*n[i-1]

n

Simulation with for loops


A for loop can generate a vector of numbers based on a mathematical function. For example:


\[n_t = 2(n_{t-1})\]

# Output:

n <- vector('numeric', length = 5)

# Seed:

n[1] <- 10

# For loop:

for(i in 2:length(n)){
  # Each value is twice the previous value:
  n[i] <- 2*n[i-1]
}
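
Because each value is twice the one before it, the loop's output can be checked against a closed form, n_t = n_1 * 2^(t - 1). The sketch below runs the doubling loop and makes that comparison; nClosed is a hypothetical name used only for the check.

```r
# Output and seed:
n <- vector('numeric', length = 5)

n[1] <- 10

# For loop, each value is twice the previous value:
for(i in 2:length(n)){
  n[i] <- 2*n[i-1]
}

n

# Closed-form check, n_t = n_1 * 2^(t - 1):
nClosed <- 10 * 2^(0:4)

all.equal(n, nClosed)
```

Note that the sequence starts at position 2, since position 1 holds the seed and has no previous value to draw on.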

Exercise Six:


One of my favorite for loops was created by Leonardo Bonacci (Fibonacci). He created the first known population model, from which the famous Fibonacci number series was created. He described a population (N) of rabbits at time t as the sum of the population at the previous time step plus the time step before that:

\[N_t = N_{t-1} + N_{t-2}\]
  1. Create an output vector of 20 numeric values.
  2. Seed the vector with the first two values, 0 and 1.
  3. Use the formula above and your seed vector to generate the first 20 numbers of the Fibonacci number sequence.

Exercise Six:



\[N_t = N_{t-1} + N_{t-2}\]
  1. Create an output vector of 20 numeric values.

fibOut <- vector('numeric', length = 20)

Exercise Six:



\[N_t = N_{t-1} + N_{t-2}\]
  2. Seed the vector with the first two values, 0 and 1.

fibOut[1:2] <- c(0,1)

Exercise Six:



\[N_t = N_{t-1} + N_{t-2}\]
  3. Use the formula above and your seed vector to generate the first 20 numbers of the Fibonacci number sequence.

for(i in 3:length(fibOut)){
  fibOut[i] <- fibOut[i-2] + fibOut[i-1]
}
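
The three steps can also be wrapped in a function so that the length of the series becomes an argument. This is one possible sketch, not part of the lesson's source script: fibSeq and its argument nValues are hypothetical names, and the function assumes at least three values are requested.

```r
# fibSeq is a hypothetical helper, not part of the lesson's source script:
fibSeq <- function(nValues){
  # This sketch assumes nValues is at least 3:
  stopifnot(nValues >= 3)
  # Output:
  fibOut <- vector('numeric', length = nValues)
  # Seed:
  fibOut[1:2] <- c(0, 1)
  # For loop:
  for(i in 3:nValues){
    fibOut[i] <- fibOut[i-2] + fibOut[i-1]
  }
  fibOut
}

# First ten values: 0 1 1 2 3 5 8 13 21 34
fibSeq(10)
```

The length check guards against the sequence 3:nValues running backwards when fewer than three values are requested.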